Abstract

Catharanthus roseus (Madagascar periwinkle) is a well-known plant with high medicinal value.
Traditional methods of crop or plant improvement, such as conventional (classical) breeding
programs, have been largely limited by self-compatibility and heterozygosity. Recently,
DNA-based molecular marker-assisted breeding has increased the speed of breeding programs and
reduced the manpower as well as the time needed to develop elite varieties. Although the
development and application of large-scale markers has been reported in Catharanthus roseus,
intron-length polymorphism (ILP) markers have not been reported so far. For the development of
ILP markers, 22,867 EST sequences were retrieved from the NCBI database and pre-processed.
Overlapping sequences were identified, and only singleton and contig sequences were used in the
study. ILP markers were designed by comparing the EST sequences with the available genomes of
model plants. Approximately 38 primer pairs flanking potential intron positions were designed
for C. roseus. BLASTX analysis of the 22,867 expressed sequence tags assigned a putative
function to 55.5% of the ESTs, which were classified into 2 different functional categories.
The developed ILP primers represent different genic regions of C. roseus and can be used to
identify polymorphisms and gene positions, and for diversity analysis and transferability
studies.

Keywords- Catharanthus roseus, intron-length polymorphism (ILP), molecular marker, genome,
ESTs, PIP database
INTRODUCTION

Catharanthus roseus is an important medicinal plant of the family Apocynaceae and is widely
used as a source of chemotherapeutic drugs. This family mainly contains herbs and small
shrubs. The plant has leaves with smooth margins; the flowers are found in the leaf axils,
borne singly or in pairs on very short stalks; another distinctive feature is its potent
milky sap (1).

Catharanthus roseus grows to a height of 20-60 cm, and its flowers are pink, white or
rosy-purple. The flowers have a basal tube 2.5-3.0 cm long and are about 2.0-5.0 cm in
diameter, with five petal-like lobes. There are 5 sepals, 2-6 mm long, narrow and usually
pubescent.


Fig 1- Picture of Catharanthus roseus


Flowering usually occurs during the summer months (March to May). Very high temperatures are
not suitable for flowering. The leaves are simple, oppositely arranged on the stem and have
entire margins, and the fruit is inconspicuous. The plants are propagated either by seeds or
by vegetative methods. Room temperature (25°C), dark conditions and low water are suitable
conditions for seed germination (2,3).

The flowers of Catharanthus roseus are pollinated by butterflies and moths. The species is
self-compatible, although under normal conditions self-pollination is rare. The plant can
also be used as an ornamental in gardens and homes in warmer places, or grown in a glasshouse
during the cold season. Catharanthus roseus, better known as the Madagascar periwinkle, is
native to the island of Madagascar in the Indian Ocean, which lies off the southeastern coast
of Africa. The periwinkle is a perennial plant that is very common in tropical and
subtropical forests (2, 4).
To date, many medicinal plants have been identified around the globe. C. roseus has been
reported to be one of the most important of them owing to the availability of more than 200
secondary metabolites. Every part of the plant (stem, root, leaf) is highly useful and is a
rich source of terpenoid indole alkaloids (TIAs). In addition to alkaloids, Catharanthus
roseus produces a wide range of phenolic compounds, including C6C1 compounds such as
2,3-dihydroxybenzoic acid, as well as phenylpropanoids such as cinnamic acid derivatives and
flavonoids. The formation of these compounds in C. roseus, as well as their biosynthesis and
the regulation of the pathway, has been reviewed. Both types of compounds compete with the
biosynthesis of indole alkaloids (4-6).

Introns are important building blocks of genomes and are scattered throughout them. They are
non-coding sequences within genes that are transcribed but removed during pre-mRNA
processing (splicing). For example, introns make up ~25% of the human genome (7). In
general, introns have little functional significance, although some can affect levels of
gene expression. Consequently, introns are more variable than coding sequences.

In plants, variation or polymorphism can be identified with DNA-based genetic markers using
PCR-based technologies, which are very useful tools for genetic research (for example,
building genetic maps or locating genes and quantitative trait loci). With the help of DNA
markers and mapping populations, genetic maps have already been developed in several crops
(8, 9-12). With advances in sequencing technologies and the development of new tools,
several DNA-based markers have been developed, such as microsatellites or simple sequence
repeats (SSRs), in silico based markers, single nucleotide polymorphisms (SNPs) and ILP
markers (12-18).

Variation in intron sequences can also be used to detect polymorphism. Such variation has
been used successfully in genetic mapping and population genetics, and it can be easily
detected by PCR. To amplify introns by PCR, primers can be designed from the flanking exons.
This approach is called exon-primed intron-crossing PCR (EPIC-PCR), and it provides a method
to identify and amplify intron-containing DNA sequences. Exon sequences have been reported
to be highly conserved; because of this, primers designed from these flanking sequences are
highly useful in a range of studies.

ILP markers have unique properties: they are gene-specific, co-dominant, hypervariable,
neutral, convenient and reliable. Like sequence-tagged site markers, ILP markers also show
transferability, that is, the ability to amplify in related plant species.


To facilitate the development of ILP markers, an online database called PIP (Potential
Intron Polymorphism) was developed (13, 17) to provide detailed information on the markers
and their homologous relationships. Despite these advantages, no reports are available to
date on the use of ILP markers in the medicinally important plant C. roseus. The primary
objective of the present study was the development of ILP markers from the publicly
available Catharanthus ESTs and their characterization.


Materials and Methods 

Molecular markers are widely used for estimating polymorphism, relatedness and mating system
parameters, and for genotype characterisation in medicinal plants. Hence, ILP markers were
developed for Catharanthus roseus.

Identification of EST sequences 

First, the Catharanthus roseus EST sequences were retrieved in FASTA format from the NCBI
(https://www.ncbi.nlm.nih.gov/), the National Center for Biotechnology Information, which
advances science and health by providing access to biomedical and genomic information.


> Open the web browser, enter NCBI in the query and click on the first link for the NCBI.
> Enter Catharanthus roseus in the query box of the page, select Nucleotide from the
  database list and click on the search button.
> Click on EST in the search results to select ESTs.
> Scroll down the page, click on Send to and select File; then select FASTA as the format
  and Accession under Sort by, and click on Create File.

The EST file of the desired species is thus generated and downloaded.
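As a quick sanity check on the downloaded file, the number of records can be counted before assembly. The sketch below is a minimal, hypothetical helper (the function name `read_fasta` and the example records are assumptions, not part of the original workflow); it parses FASTA text using only the Python standard library.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current_id = header[0] if header else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalise the case.
            records[current_id] += line.upper()
    return records


# Tiny made-up example standing in for the downloaded EST file.
example = """>EST_A example record
ACGTAC
GTTA
>EST_B
ggcc
""".splitlines()

ests = read_fasta(example)
print(len(ests), "records parsed")
```

For a real download, the same function could be called on an open file handle, for example `read_fasta(open("c_roseus_ests.fasta"))`, where the file name is a placeholder.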

Pre-processing of the sequences 

The FASTA sequences retrieved from the web were pre-processed with the web server of the
CAP3 software (http://doua.prabi.fr/software/cap3) to identify the unique EST sequences.

The CAP3 algorithm computes overlaps between sequences and then joins the reads in
decreasing order of overlap to form contigs.
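The idea of joining reads in decreasing order of overlap can be illustrated with a toy greedy assembler. This is only a sketch under simplifying assumptions (exact suffix-prefix matches, no sequencing errors, no reverse strands, no quality values) and is not CAP3's actual implementation; the function names and the example reads are hypothetical.

```python
from typing import List, Tuple


def overlap_len(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best: Tuple[int, int, int] = (0, -1, -1)  # (overlap, i, j)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap_len(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            # Nothing left to join: merged contigs plus unmerged singletons.
            return seqs
        merged = seqs[i] + seqs[j][k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (i, j)] + [merged]


# Made-up reads: three overlap into one contig, one stays a singleton.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGAAAC"]
result = greedy_assemble(reads)
print(result)
```

Sequences that never merge with anything correspond to singletons, while merged sequences correspond to contigs.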


> The page http://doua.prabi.fr/software/cap3 was opened and the sequences from the FASTA
  file retrieved from the NCBI database were entered.
> CAP3 returned two files after pre-processing, the contigs and the singleton sequences;
  the subsequent steps were therefore carried out separately for each.


Selection of candidate EST sequences 
The Basic Local Alignment Search Tool (BLAST; https://blast.ncbi.nlm.nih.gov/Blast.cgi) was
used to identify the specific functions of the non-redundant ILP-containing EST sequences.
On the basis of the BLAST hits, the homologous genomic regions were identified. BLAST
searches were performed to provide complete coverage across the genome sequence.

> Open https://blast.ncbi.nlm.nih.gov/Blast.cgi and enter the sequences from the file.
> Set the desired parameters and submit the sequences.
> Among the various hits, select the one with the highest query coverage, identity and
  score.
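The hit-selection step can be sketched as a small ranking function. The field names, the example hits and the priority order (query coverage first, then identity, then bit score) are assumptions made for illustration; the text does not state how ties between the three criteria were resolved. The E-value filter of 1e-05 follows the cut-off quoted later for the BLASTX analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    subject: str
    query_coverage: float  # percent of the query covered by the alignment
    identity: float        # percent identity
    bit_score: float
    evalue: float


def best_hit(hits: List[Hit], max_evalue: float = 1e-5) -> Optional[Hit]:
    """Return the top-ranked hit that passes the E-value filter, or None."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    if not kept:
        return None
    # Assumed priority: coverage, then identity, then bit score.
    return max(kept, key=lambda h: (h.query_coverage, h.identity, h.bit_score))


# Made-up hits for a single query sequence.
hits = [
    Hit("subject_1", query_coverage=92.0, identity=81.5, bit_score=210.0, evalue=3e-40),
    Hit("subject_2", query_coverage=92.0, identity=88.0, bit_score=190.0, evalue=1e-35),
    Hit("subject_3", query_coverage=99.0, identity=95.0, bit_score=40.0, evalue=0.5),
]
top = best_hit(hits)
print(top.subject if top else "no significant hit")
```

Here `subject_3` is discarded by the E-value filter despite its high coverage, and `subject_2` wins the tie on coverage through its higher identity.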


Data Analysis 
The data collected from CAP3 and BLAST were then analysed for the identification and
characterization of ILPs.


Fig 2- Flow Chart of Methodology. Steps shown: EST sequences (NCBI dbEST); unigene
definition with CD-HIT (Cluster Database at High Identity with Tolerance); development of
specific intron-based markers with the PIP database (http://ibi.zju.edu.cn/pgel/pip/);
design of a pair of primers flanking the intron position; BLASTX analysis
(https://blast.ncbi.nlm.nih.gov/Blast.cgi); biological, cellular and metabolic function of
the primers.
Results and Discussion


EST assembly
We downloaded a set of 22,867 ESTs from the Catharanthus roseus EST database available at
NCBI (https://www.ncbi.nlm.nih.gov/).


Fig 3: EST sequences of Catharanthus roseus retrieved from the NCBI database (17-07-20)


Pre-processing of sequences
EST sequences of Catharanthus roseus were downloaded from ftp://ftp.ncbi.nih.gov/blast/db/.
Approximately 22,867 sequences were pre-processed to remove overlapping sequences. Around
18,992 singletons and 1127 contigs were identified and selected with the CAP3 software
(12, 19). These sequences were then used to identify ILP-containing sequences.


Fig 4 - EST sequences of C. roseus (figure label: Singleton)
Fig 5- PRABI CAP3 web server, CAP3 Sequence Assembly Program (17-07-20)
ILP mining and primer designing 
Unique sequences were processed to develop ILP primers flanking introns, using the
Catharanthus roseus singleton sequences. PIP identifies exon-intron boundaries and predicts
suitable primers flanking the intronic regions (Table 1).
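The principle of placing primers in the exons on either side of a predicted intron can be sketched as follows. This is an illustrative simplification, not PIP's algorithm: the function names, the default primer length and offset, and the example sequence are assumptions, and a real design would also screen melting temperature, GC content and self-complementarity.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def flanking_primers(est: str, intron_pos: int,
                     primer_len: int = 20, offset: int = 10) -> Tuple[str, str]:
    """Return (forward, reverse) primers flanking a putative intron.

    intron_pos is the 0-based index in the EST at which the intron is
    predicted to be inserted in the genomic sequence.
    """
    est = est.upper()
    fwd_end = intron_pos - offset      # forward primer ends upstream of the junction
    fwd_start = fwd_end - primer_len
    rev_start = intron_pos + offset    # reverse primer site lies downstream
    rev_end = rev_start + primer_len
    if fwd_start < 0 or rev_end > len(est):
        raise ValueError("not enough exon sequence on both sides of the intron")
    forward = est[fwd_start:fwd_end]
    # The reverse primer anneals to the sense strand, so it is reverse-complemented.
    reverse = reverse_complement(est[rev_start:rev_end])
    return forward, reverse


# Made-up 60-nt EST with a putative intron position at index 30.
est = "ATGGCTTCAGGACTCTTAAACTTGGAGCTGACCAGGGAACTTCATGTTCTTCAGTTGCAA"
fwd, rev = flanking_primers(est, intron_pos=30, primer_len=10, offset=5)
print(fwd, rev)
```

Because both primers sit in conserved exon sequence, the amplicon length differs between genotypes only if the enclosed intron differs in length, which is the polymorphism an ILP marker detects.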
Characterization of Primers 
The developed ILP primer pairs were characterized and analysed using BLASTX searches.
A GC content above 50% and an E-value cut-off of 1e-05 were considered optimum for
the BLASTX analysis.
The putative functions of the ILP markers were assigned by performing BLASTX
searches of the corresponding marker-containing EST sequences against the NCBI
database (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with standard search parameters.
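Primer characteristics such as GC content can be computed directly from the sequence. The sketch below is illustrative only: `gc_percent` and `wallace_tm` are hypothetical helper names, and the Wallace rule (Tm = 2(A+T) + 4(G+C)) is a rough estimate for short oligonucleotides that is swapped in here for demonstration, so its values need not match the Tm values reported in Table 1, whose calculation method is not stated.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = primer.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Reverse primer listed for AT1G13950 in Table 1.
primer = "ACTTGGCATGACCGTGCT"
print(round(gc_percent(primer), 1), wallace_tm(primer))
```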
Table 1 List of primers and their amplification characteristics

S.No. | ID | Tm (°C) | GC% | Forward primer | Reverse primer
1 | AT4G37930 | 58 | 39 | GACAACCACTTAGTTTTGGTGAA | CCAGGAACAGTGTTTTTGTTAGC
2 | AT1G23490 | 60 | 55 | TCTCCAGAAGTGGCACAGGT | TCTTCCCAATGCAATGAATG
3 | AT5G12250 | 57 | 33 | ATTTGCTTCAGGACTCTTAAACTT | AATTGAGCTGACCAGGGAAC
4 | AT5G54680 | 60 | 43 | GAGGATTCATGTTCTTCAGTTGC | AAAGAACTTAAAGCTGAGAAAAAC
5 | AT2G36580 | 58 | 43 | GCAATTAAAGTCAAGGCATCC | AAGACGAGGGATGACAACAGA
6 | AT5G17770 | 58 | 39 | TCCTTCCTTGAGGATACATCTTT | AGTCAAGGTGAGGAGGTGATT
7 | AT5G17770 | 57 | 47 | AATCACCTCCTCACCTTGACT | TCAGCCATAATGTAGCAAAGTTTA
8 | AT4G39910 | 60 | 34 | GGGTGGCTTTTTAATTTTCATTC | GAAGAACTTTAGTTCCACCGAGA
9 | AT1G12240 | 59 | 40 | TCCTTGGCATGGAATTTTTC | GACAAGAAGGAATTCGCTCAA
10 | AT5G25150 | 62 | 63 | CGGCCGAGGTACAGCACTA | AGCCACCAGCAACAAGAGAT
11 | AT1G13950 | 61 | 56 | CGGCACCATCCGTAAGAAC | ACTTGGCATGACCGTGCT
12 | AT2G39390 | 59 | 45 | CTGACTGTAATTTCGCAGAAGC | GGATGGCTCTGGTGTTCTTCC
13 | AT1G51160 | 59 | 50 | ATGAACACCCTTGTCCAGGT | AAGCTCATGTTTGGCTTGCT
14 | AT5G16960 | 59 | 35 | TAAATATCAATGCCCTCTGGAAA | TGAAGAACAAATTTGGCTTTGAT
15 | AT1G76630 | 59 | 55 | GCCGAGGTACCTTCTCTGAA | TGGAGTGTGAAGGCAATTAGG
16 | AT2G19790 | 59 | 50 | ATGATGTCCAGTTCGCACAC | TTGGTGGGAGTTGACAATGA
17 | AT2G19790 | 59 | 45 | TTCATTGTCAACTCCCACCA | CTCGCACCGAACAACAGTG
18 | AT4G26900 | 59 | 39 | TGCACCTGATCTGAAATACTTTG | TTGTACCGTTGACAGTTGGTG
19 | AT4G26900 | 59 | 61 | CCCCCACCAACTGTCAAC | TTACGGGTTTTCGAGACTTCC
20 | AT4G26900 | 60 | 60 | GTCCCCAAGAGGGAAGTCTC | AACCTTGGCAAGCCAGTAGA
21 | AT4G35450 | 60 | - | ACTCCTTGCGTCCGTAACC | GGAGAGGTGAAATGTGCTCAG
Table 2 Characteristics of EST-derived ILPs for Catharanthus roseus.

ID | EST accession no. | E-value | Putative function
... | ... | ... | ... 020327 [Oryza ...]
... | ... | ... | ... [... italica]
... | EG562736.1 | 7e-24 | ... 049086 [Papaver ...]
... | ... | ... | ... [... sativa]
AT1G76630 | EG562602.1 | ... | ...
... | EG562578.1 | ... | ... protein CISIN_1g018578mg [Citrus sinensis]
AT1G69620 | CX119705.1 | ... | ref|XP_004238799.1| 60S ribosomal protein L3# [Solanum lycopersicum]
AT4G17300 | EG562465.1 | ... | asparagine--tRNA ligase, chloroplastic/mitochondrial [Solanum pennellii]
... | EG562407.1 | ... | putative 60S ribosomal protein L18-1 [Hibiscus syriacus]
... | EG562374.1 | ... | chaperone protein dnaJ 15-like [Nicotiana ...]
... | EG562271.1 | ... | rab9 effector protein with kelch motifs isoform X1
... | EG562232.1 | ... | ...
Data mining for development of ILP markers
Around 22,867 EST sequences of C. roseus were extracted from the EST database and
assembled into 20,119 unique sequences (18,992 singletons and 1127 contigs), from
which 38 ILP primer pairs were identified. The ILP markers were mined with the PIP
database, which thus served as a resource for the mining and development of new
markers. The remaining sequences could not be used to design ILP markers, as they
did not meet the initial design criteria.

Introns are non-coding DNA sequences within genes and are generally not functionally
important in plant metabolism, although some introns can affect the level of gene
expression. In addition, they are more variable than coding sequences because
selective pressure is much lower in intronic regions than in exonic regions (13, 17).
Although Huang et al. (16) showed that ILP markers are more versatile than other
markers, very few polymorphic ILP markers are available. In addition, other markers
show lower cross-species transferability to wild relatives than ILP markers. Thus,
ILP markers are suitable for characterizing wild relatives.


Functional annotation of ILP 

BLASTX analysis of the 22,867 EST sequences assigned defined functions to many of
the ILP markers, while some showed no similarity to previously sequenced genes.
The ILP markers with a defined function (55.5%) were grouped into seven main
categories (Figure 6). The largest category (37.2%) contained EST sequences with
hypothetical/unknown/putative functions. The second largest class (22.6%) comprised
photosynthesis-related genes. Stress-related genes (12.5%) ranked third, followed by
defence (11.2%), secondary metabolism (8.1%), transcription factors (6.1%) and
primary metabolism (2.3%) (Figure 6). As Catharanthus roseus is a potentially abiotic
stress-tolerant plant, particularly for drought and salinity, studying the EST
sequences with hypothetical/unknown/putative functions (37.2%) and those with no
resemblance to previously sequenced genes (45.5%) may reveal new details of stress
tolerance mechanisms.
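The category shares quoted above can be tabulated and checked with a few lines of code. The snippet below is a simple arithmetic sketch using only the percentages given in the text; the category labels are shortened paraphrases.

```python
# Share (%) of each functional category, as quoted in the text.
shares = {
    "hypothetical/putative": 37.2,
    "photosynthesis": 22.6,
    "stress related": 12.5,
    "defence": 11.2,
    "secondary metabolism": 8.1,
    "transcription factors": 6.1,
    "primary metabolism": 2.3,
}

# Rank categories from largest to smallest share.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
total = round(sum(shares.values()), 1)

for name, pct in ranked:
    print(f"{name:<24}{pct:5.1f}%")
print("total:", total)
```

The seven shares add up to 100%, which confirms that they describe a complete partition of the annotated set.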
Fig 6- Functional classification of ILP markers containing ESTs/genes. The unique
genes were grouped into seven functional groups. Legend entries legible in the chart:
primary metabolism, photosynthesis-related genes, stress induced, secondary
metabolism, defence related.


Biological and Molecular Function: 
The putative functions of the ESTs obtained from the BLASTX analysis were used to
classify them into two groups based on homology, (a) biological process and
(b) molecular function, as in a previous study (12). The significance of this work
is that the ILPs showed homology to genes of known function, demonstrating a good
approach for using these ILP sites as molecular markers to saturate primary and
secondary metabolic pathways in plants (13). The polymorphism and transferability of
the ILP markers show the value of the developed markers. The cross-species potential
of the developed ILPs provides a good opportunity to study uncharacterized medicinal
plants. The high specificity of these primer pairs, together with their polymorphism
and transferability, suggests that the markers developed in this study will be very
useful in marker-assisted selection, genetic diversity studies, linkage mapping and
comparative analysis. The work presented here complements efforts to develop a
well-saturated molecular map of C. roseus. The newly developed ILPs are informative
and are a valuable source of gene-based markers.
Fig 7. Classification of EST-ILPs on the basis of significant match with putative
proteins: (A) Biological Process (legend includes chitin catabolic process and cell
wall macromolecule catabolic process) and (B) Molecular Function (legend includes DNA
binding, RNA binding and metal-ion binding).


Conclusion 
Catharanthus roseus, commonly known as "periwinkle (evergreen)", is a highly
studied plant due to its medicinal properties. The secondary metabolites produced by
this plant include antileukemic (vincristine, vinblastine) and antihypertensive
(ajmalicine and serpentine) compounds (20). However, extremely low yields prevent the
widespread use of these alkaloids for therapeutic purposes. Molecular marker-based
techniques have been used to map high-yielding varieties. Currently, ILPs are emerging
as a powerful class of molecular markers owing to their abundance, hypervariability,
ease of analysis, high polymorphism and transferability compared with other available
markers. ILP markers are used in the selection of high-yielding varieties, molecular
mapping and the analysis of quantitative traits. Apart from these properties, the
development of ILP markers requires lower cost and less time, and the markers are
highly informative (13,17). Although the development of ILP markers has been reported
in some plant species (13,17), this is the first report of the development of ILP
primers in Catharanthus. During this study, 38 primers were developed and
characterized.

The ILP primers developed from C. roseus can be used to identify Catharanthus
species and to characterize C. roseus genotypes, and for genetic diversity analysis,
genetic improvement and molecular DNA fingerprinting of Catharanthus species. In
addition, their high level of cross-species transferability will increase our
understanding of gene introgression and evolutionary relationships among wild
relatives of Catharanthus species.